Leveraging locality for topic identification of conversational speech

Author

  • Jonathan Wintrode
Abstract

We evaluate the limitations of the bag-of-words assumption for topic identification of conversational discourse by examining whether topic-dependent word occurrence statistics are also position-independent. We demonstrate where the assumption is violated in conversational speech corpora and show how the relevance of words to the classification task decreases over the length of the document. We seek to improve topic identification by modeling this topic drift phenomenon and weighting word counts according to a decay function over the length of the document. By applying a global decay rate for all words, we observe relative error-rate reductions of 23-47% on conversational corpora. Furthermore, we apply a minimum classification error (MCE) training procedure to learn per-word decay rates, and reduce error rates by up to an additional 27%.
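
The idea of weighting word counts by position can be illustrated with a minimal sketch. The abstract does not state the form of the decay function, so the exponential decay over relative position used here is an assumption, as are the names `decayed_counts`, `decay_rate`, and `per_word_rates`. The MCE training step that would learn the per-word rates is not shown.

```python
import math
from collections import defaultdict


def decayed_counts(tokens, decay_rate=0.0, per_word_rates=None):
    """Return position-weighted word counts for one document.

    Assumption (not from the paper): each occurrence contributes
    exp(-rate * relative_position), where relative_position runs
    from 0 at the start of the document toward 1 at the end.
    With rate 0 this reduces to ordinary bag-of-words counts.
    """
    per_word_rates = per_word_rates or {}
    n = len(tokens)
    counts = defaultdict(float)
    for i, tok in enumerate(tokens):
        rel_pos = i / n  # in [0, 1)
        # Use a word-specific rate if one is given, else the global rate.
        rate = per_word_rates.get(tok, decay_rate)
        counts[tok] += math.exp(-rate * rel_pos)
    return dict(counts)


if __name__ == "__main__":
    doc = ["music", "music", "weather", "weather"]
    print(decayed_counts(doc))                  # plain counts
    print(decayed_counts(doc, decay_rate=1.0))  # early words weigh more
```

With a positive global rate, words near the start of the conversation contribute more than words near the end. Passing `per_word_rates` overrides the global rate for individual words, which is one way to picture what learned per-word decay rates would change.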

Similar resources

Confidence-Based Techniques for Rapid and Robust Topic Identification of Conversational Telephone Speech

We investigate the impact of automatic speech recognition errors on the accuracy of topic identification in conversational telephone speech. We present a modified TF-IDF feature-weighting calculation that provides significant robustness under various recognition error conditions. For our experiments we take conversations from the Fisher corpus to produce 1-best and lattice outputs using one reco...

Techniques for rapid and robust topic identification of conversational telephone speech

In this paper, we investigate the impact of automatic speech recognition (ASR) errors on the accuracy of topic identification in conversational telephone speech. We present a modified TF-IDF feature weighting calculation that provides significant robustness under various recognition error conditions. For our experiments we take conversations from the Fisher corpus to produce 1-best and lattice ...

A Boosting Approach to Topic Spotting on Subdialogues

We report the results of a study on topic spotting in conversational speech. Using a machine learning approach, we build classifiers that accept an audio file of conversational human speech as input, and output an estimate of the topic being discussed. Our methodology makes use of a well-known corpus of transcribed and topic-labeled speech (the Switchboard corpus), and involves an interesting do...

Topic Identification from Audio Recordings Using Rich Recognition Results and Neural Network Based Classifiers

This paper investigates the use of a Neural Network classifier for topic identification from conversational telephone speech, which exploits rich recognition results coming from an automatic speech recognizer. The baseline features used to feed the neural classifier are produced using the words extracted from the 1-best sequence. Rich recognition results include the word union of the first n-be...

Topic Learning in Text and Conversational Speech

Publication date: 2013